Selecting discard packets in receiver for voice over packet network

ABSTRACT

A method includes receiving a voice signal in the form of a sequence of data packets, detecting a marker bit in a header of one of the received packets, selecting at least one received packet in response to the detected marker bit, and dropping the selected at least one packet. Other embodiments are described and claimed.

BACKGROUND

In Voice over Packet (VoP) telephony applications, a voice signal is transmitted in the form of data packets at a pre-determined frame rate, such as one packet for every 10 milliseconds. It sometimes is necessary to discard an occasional packet at the receiver of a VoP connection. This may occur, for example, if the local clock at the receiver is slightly slower than the transmitter clock, or if the jitter delay needs to be reduced due to, e.g., a change in network conditions.

Dropping a packet at the receiver has the potential of causing an audible glitch in the output audio signal, since the pitch period of the audio signal is not synchronous with the frame rate. It is therefore desirable to perform packet drops during periods of silence, when no adverse effect on sound quality will occur.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a VoP network connection according to some embodiments.

FIG. 2 is a flow chart that illustrates a process performed in a receiver that is part of the VoP network connection of FIG. 1.

FIG. 3 is a block diagram of a VoP network connection according to some other embodiments.

FIG. 4 is a flow chart that illustrates a process performed in a receiver that is part of the VoP network connection of FIG. 3.

FIG. 5 is a block diagram of a system that includes a receiver such as those shown in FIG. 1 or FIG. 3.

DETAILED DESCRIPTION

FIG. 1 is a block diagram of a VoP network connection 100 according to some embodiments. The VoP network connection 100 is formed of a VoP transmitter 102, a VoP receiver 104, and a data communication channel 106 by which a sequence of voice signal data packets is transmitted from the transmitter 102 to the receiver 104. The data communication channel 106 may be provided, for instance, via a data communication network such as the Internet. The packets may be formed in accordance with a standard format (e.g., in accordance with the well-known Real-time Transport Protocol, or RTP, used for transport of real time data over the Internet) and include header data as well as data which corresponds to the audio input signal.

The transmitter 102 may be part of a PSTN-IP (public switched telephone network/Internet protocol) gateway device, which is not shown apart from the transmitter 102. The transmitter 102 includes an encoder 108 which receives a voice signal in the form of an audio input signal 110 and converts the audio input signal into a sequence of data packets for transmission to the receiver 104 via the communication channel 106. The transmitter 102 also includes a voice activity detector 112 which also receives the audio input signal 110 and which is coupled to the encoder 108, as indicated at 114.

The voice activity detector 112 may operate in accordance with conventional practices to determine when the audio input signal contains voice activity. Further in accordance with conventional practices, the transmitter may operate in a silence suppression mode, in which packets that contain only silence (i.e., do not contain voice activity) are not transmitted. In such cases, a voice activity marker bit is set in the header of the first packet which contains voice after a silence period. The suppression of silence packets and setting of the voice activity marker bit both may be controlled in response to output from the voice activity detector.

In accordance with some embodiments, the voice activity marker bit may also be set after periods of silence even when the transmitter is not operating in the silence suppression mode. Thus, the voice activity marker bit is set in the first packet containing voice activity transmitted by the transmitter 102 immediately after transmission of one or more packets which correspond to a period of silence in the input audio signal. As will be seen, the voice activity marker bit may be used at the receiver 104 in accordance with some embodiments to control selection of data packets for dropping at the receiver 104.
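By way of illustration only, the marker-setting behavior described above might be sketched as follows. The function name mark_packets, the per-frame boolean input, and the choice to leave the very first frame unmarked are assumptions made for this sketch and are not taken from the embodiments.

```python
from typing import List, Tuple


def mark_packets(voice_flags: List[bool], suppress_silence: bool) -> List[Tuple[int, bool]]:
    """Return (frame_index, marker_bit) for every packet that would be sent.

    voice_flags[i] is the (hypothetical) voice activity detector output
    for frame i. The marker bit is set on the first voice frame that
    follows one or more silence frames, whether or not silence
    suppression is active.
    """
    sent = []
    prev_was_silence = False  # assumption: no silence precedes the first frame
    for index, has_voice in enumerate(voice_flags):
        marker = has_voice and prev_was_silence
        # In silence suppression mode, silence frames are not transmitted.
        if has_voice or not suppress_silence:
            sent.append((index, marker))
        prev_was_silence = not has_voice
    return sent
```

With suppression disabled, every frame is sent and only the first voice frame after a silence run carries the marker. With suppression enabled, the silence frames are simply absent from the returned list.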

The receiver 104 may be part of a PSTN-IP gateway device, which is not separately shown. The receiver 104 includes a jitter buffer 116 which is coupled to the communication channel 106 to receive and store the sequence of voice signal data packets transmitted from the transmitter 102 via the communication channel 106. Thus the jitter buffer 116 operates to buffer the sequence of data packets received at the receiver 104.

The receiver 104 also includes a jitter delay control circuit 118 which is coupled to the jitter buffer 116 and operates in accordance with some embodiments in a manner which is described below.

The receiver 104 further includes a decoder 120 which is coupled to the jitter buffer 116. The decoder 120 converts a sequence of voice signal data packets received by the decoder 120 from the jitter buffer 116 into an audio output signal 122. The decoder 120 may operate in accordance with conventional practices.

FIG. 2 is a flow chart that illustrates a process that may be carried out by the receiver 104 (and particularly by the jitter delay control circuit 118) in accordance with some embodiments.

As noted above, it may sometimes be necessary to drop packets at the receiver 104 because of, e.g., slight clock rate mismatches or a need to decrease the jitter delay at the jitter buffer 116 due to changes in network conditions. The process illustrated in FIG. 2 is concerned with selection of packets to be dropped in accordance with some embodiments. As used herein and in the appended claims, “dropping” a packet at the receiver refers to excluding all or part of the packet from a sequence of packets used to generate an audio output at the receiver. For example, a packet may be dropped by omitting the packet from the sequence of packets read out from the jitter buffer 116 to the decoder 120 for conversion into the output audio signal 122.

As indicated at 200 in FIG. 2, the receiver 104 (e.g., the jitter delay control circuit 118) may determine whether it is currently necessary to drop one or more of the received voice signal data packets. If so, the jitter delay control circuit 118 may read at least some of the bits in the headers of at least some of the packets stored in the jitter buffer 116 to determine whether the voice activity marker bit has been set with respect to one of the packets stored in the jitter buffer 116. Thus, as indicated at 202 in FIG. 2, the jitter delay control circuit 118 may detect the voice activity marker bit in one of the packets stored in the jitter buffer 116. If need be, the jitter delay control circuit 118 may operate to wait for a packet having the voice activity marker bit to be received before selecting a packet or packets for dropping.
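As a minimal sketch of the header inspection at 202, and assuming the packets use the RTP fixed header (in which the marker bit is the most significant bit of the second header octet), the check might look like the following. The function name rtp_marker_bit is hypothetical, and other packet formats would place the bit elsewhere.

```python
def rtp_marker_bit(packet: bytes) -> bool:
    """Return True if the marker (M) bit is set in an RTP packet.

    Assumes the 12-octet RTP fixed header: octet 0 carries version,
    padding, extension and CSRC count; octet 1 carries the M bit
    (0x80) followed by the 7-bit payload type.
    """
    if len(packet) < 12:
        raise ValueError("packet is shorter than the 12-octet RTP fixed header")
    return bool(packet[1] & 0x80)
```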

In response to the detected voice activity marker bit, the jitter delay control circuit 118 may select one or more packets stored in the jitter buffer 116 which immediately precede in the sequence of received packets the packet which has the header in which the voice activity marker bit was detected. The jitter delay control circuit 118 may then operate so that the selected packet or packets are dropped. For example, the jitter delay control circuit 118 may control the jitter buffer 116 so that the selected packet or packets are not read out from the jitter buffer 116 to the decoder 120 for conversion into the audio output signal 122. Thus the decoder 120 generates the audio output signal from the sequence of data packets stored in the jitter buffer 116 less the dropped packet or packets. Selection and dropping of a packet or packets as described in this paragraph is represented by block 204 in FIG. 2.

Because the voice activity marker bit signifies the beginning of voice activity following a period of silence, it may be assumed that a packet or packets which immediately precede the packet having the marker bit correspond to a period of silence. Thus selecting such packet or packets for dropping is unlikely to result in a glitch in the audio output signal, so that the audio quality perceived by a user at the receiver side of the VoP connection 100 may be improved.
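The selection and dropping represented by blocks 202 and 204 might be sketched as below. This is an illustrative sketch only: the names select_drops and read_out, the list-based model of the jitter buffer, and the choice to return None while waiting for a marker packet are all assumptions of the sketch.

```python
from typing import List, Optional, Sequence


def select_drops(markers: Sequence[bool], drops_needed: int) -> Optional[List[int]]:
    """Pick buffer positions to drop, immediately preceding a marker packet.

    markers[i] is True if buffered packet i has the marker bit set.
    Returns None when no usable marker packet is buffered yet, in
    which case the caller may wait for one to arrive.
    """
    if drops_needed <= 0:
        return []
    for position, marked in enumerate(markers):
        if marked and position > 0:
            start = max(0, position - drops_needed)
            return list(range(start, position))
    return None


def read_out(buffer: Sequence[str], drops: Sequence[int]) -> List[str]:
    """Return the packets passed to the decoder, omitting dropped ones."""
    skip = set(drops)
    return [pkt for i, pkt in enumerate(buffer) if i not in skip]
```

If fewer packets precede the marker packet than are requested, the sketch drops only those that are available. A marker packet at the head of the buffer is skipped because nothing buffered precedes it.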

In some embodiments, silence detection for the purposes of setting the marker bit at the transmitter 102 may be performed by a transmitter component other than the voice activity detector 112 used for silence suppression. For example, the transmitter 102 may include an automatic gain control (AGC) circuit (not shown) which may provide an energy calculation for the input audio, and the energy calculation may be used to distinguish between silence and speech. Alternatively, a codec (not separately shown) of which the encoder 108 is a part may have a capability to detect silence, and this capability may be used to set the voice activity marker bit.
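An energy calculation of the kind mentioned above might, for illustration, take the following form. The 16-bit sample range, the -50 dB threshold, and the function names are assumptions chosen for this sketch; a practical AGC circuit may compute and smooth energy differently.

```python
import math
from typing import Sequence

FULL_SCALE = 32768.0  # assumption: 16-bit linear PCM samples


def frame_energy_db(samples: Sequence[int]) -> float:
    """Mean-square energy of one frame in dB relative to full scale."""
    if not samples:
        raise ValueError("frame must contain at least one sample")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10.0 * math.log10(mean_square / (FULL_SCALE * FULL_SCALE))


def is_silence(samples: Sequence[int], threshold_db: float = -50.0) -> bool:
    """Treat a frame as silence when its energy is below the threshold."""
    return frame_energy_db(samples) < threshold_db
```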

In some embodiments, the voice activity marker bit may be set only after detecting a period of silence that is sufficiently long to support dropping of packets at the receiver.
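One possible way to apply such a minimum silence length is sketched below. The parameter min_silence_frames and the function name are hypothetical; the embodiments do not specify how long a period is sufficient.

```python
from typing import List


def mark_after_long_silence(voice_flags: List[bool], min_silence_frames: int) -> List[bool]:
    """Return a marker bit per frame, set only after a long enough silence.

    A voice frame is marked when the run of silence frames directly
    before it is at least min_silence_frames long.
    """
    if min_silence_frames < 1:
        raise ValueError("min_silence_frames must be at least 1")
    markers = []
    silence_run = 0
    for has_voice in voice_flags:
        if has_voice:
            markers.append(silence_run >= min_silence_frames)
            silence_run = 0
        else:
            markers.append(False)
            silence_run += 1
    return markers
```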

It will be appreciated that the transmitter 102 may be part of a device (not separately shown) which also includes a receiver like the receiver 104, and the receiver 104 may be part of a device (not separately shown) which includes a transmitter like the transmitter 102, so that a two-way telephone connection may be made.

FIG. 3 is a block diagram of a VoP network connection 300 according to some other embodiments. The VoP network connection 300 is formed of a VoP transmitter 302, a VoP receiver 304 and a data communication channel (which is again indicated by reference numeral 106, since it may be the same as the data communication channel discussed in connection with FIG. 1). As before, a sequence of voice signal data packets may be transmitted via the data communication channel 106 from the transmitter 302 to the receiver 304, and the packets may be in the RTP format, for example.

Again the transmitter 302 may be part of a PSTN-IP gateway device (not separately shown). The transmitter 302 includes an encoder 306 which may be like the encoder described in connection with FIG. 1. The transmitter may, but need not, include a voice activity detector, which is not shown in FIG. 3. The transmitter may operate entirely in accordance with conventional practices.

The receiver 304 may also be part of a PSTN-IP gateway device, which is not separately shown. The receiver 304 includes a jitter buffer 308, which may be like the jitter buffer described in connection with FIG. 1. The receiver 304 also includes a jitter delay control circuit 310 which is coupled to the jitter buffer 308 and operates in accordance with some embodiments in a manner which is described below. The receiver 304 further includes a decoder 312 which is coupled to the jitter buffer 308 and which may be like the decoder described in connection with FIG. 1.

In addition, the receiver 304 may include an automatic level control (ALC) component 314 which is coupled to the decoder 312 and to the jitter delay control circuit 310. The ALC component 314 may include a level estimation and active voice detector block 316 which is coupled to the decoder 312 to receive and analyze the output audio signal 318 generated by the decoder 312. As will be seen, the level estimation and active voice detector block 316 may also be coupled to the jitter delay control circuit 310 to provide an output to the jitter delay control circuit 310.

In addition, the ALC component 314 may include a gain adjustment block 320 which is coupled to the output of the decoder 312 and is controlled by the level estimation and active voice detector block 316. In accordance with conventional practices, the level estimation and active voice detector block 316 receives and analyzes the output audio signal 318 produced by the decoder 312 to determine the signal amplitude level and also to detect the presence of speech in the output audio signal 318. The level estimation and active voice detector block 316 controls the gain adjustment block 320 to increase the gain when the level of the audio output signal 318 is low and to decrease the gain when the level of the audio output signal 318 is high, but the gain adjustment may be temporarily disabled when speech is not present. Disabling of the gain adjustment in this case may prevent noise from being amplified.
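The gain control behavior just described might be sketched, under stated assumptions, as a simple step controller. The target level, the step size, and the function name update_gain_db are hypothetical values chosen for illustration and do not describe any particular ALC implementation.

```python
def update_gain_db(gain_db: float, level_db: float, speech_present: bool,
                   target_db: float = -20.0, step_db: float = 1.0) -> float:
    """Return the next gain setting for the gain adjustment block.

    level_db is the estimated level of the decoder output. The gain is
    raised when the level is below the target and lowered when it is
    above, but it is held unchanged while no speech is present so that
    background noise is not amplified.
    """
    if not speech_present:
        return gain_db
    if level_db < target_db:
        return gain_db + step_db
    if level_db > target_db:
        return gain_db - step_db
    return gain_db
```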

In accordance with some embodiments, the jitter delay control circuit 310 may receive from the level estimation and active voice detector block 316 a signal or signals indicative of characteristics of the packet currently being decoded (as reflected by the output of the decoder 312). The characteristics, which may be detected by the level estimation and active voice detector block 316, may include whether the packet is a speech packet or a silence packet, whether the packet is a low level noise packet, or whether the packet is a low level speech packet (i.e., speech at a volume level that is below a predetermined threshold). In other embodiments, the level estimation and active voice detector block 316 may only indicate to the jitter delay control circuit 310 that the current packet is a silence packet, or may only indicate to the jitter delay control circuit 310 that the current packet is a silence packet or a low level noise packet. In response to the packet characteristic detected by the level estimation and active voice detector block 316, the jitter delay control circuit 310 may determine whether or not to drop the current packet.
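For illustration, the packet characteristics listed above might be derived from a level estimate and a speech-present flag as follows. The threshold values, the extra NOISE category for louder non-speech, and the names PacketKind and classify are assumptions of this sketch; the embodiments do not specify how the categories are separated.

```python
from enum import Enum


class PacketKind(Enum):
    SILENCE = "silence"
    LOW_LEVEL_NOISE = "low_level_noise"
    LOW_LEVEL_SPEECH = "low_level_speech"
    SPEECH = "speech"
    NOISE = "noise"  # assumption: non-speech too loud to count as low level


def classify(level_db: float, speech_present: bool,
             silence_db: float = -60.0,
             low_noise_db: float = -45.0,
             low_speech_db: float = -35.0) -> PacketKind:
    """Map a level estimate and speech flag to a packet characteristic."""
    if speech_present:
        if level_db < low_speech_db:
            return PacketKind.LOW_LEVEL_SPEECH
        return PacketKind.SPEECH
    if level_db < silence_db:
        return PacketKind.SILENCE
    if level_db < low_noise_db:
        return PacketKind.LOW_LEVEL_NOISE
    return PacketKind.NOISE
```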

FIG. 4 is a flow chart that illustrates a process that may be carried out by the receiver 304 (and particularly by the jitter delay control circuit 310) in accordance with some embodiments.

As indicated at 400 in FIG. 4, the receiver 304 (e.g., the jitter delay control circuit 310) may determine whether it is currently necessary to drop one or more of the received voice signal data packets. If so, it may next be determined at 402 whether dropping of a packet is urgently needed. For example, the determination at 402 may include determining whether a pre-determined period has timed out, without dropping of a packet, since it was determined at 400 that dropping a packet was necessary.

If a negative determination is made at 402, then a determination indicated at 404 is made. At 404 it is determined whether the level estimation and active voice detector block 316 is signaling that the current packet is a silence packet or a low level noise packet. If a positive determination is made at 404 (i.e., if the current packet is either a silence packet or a low level noise packet), then the jitter delay control circuit 310 may operate so that the current packet is dropped (as indicated at 406). For example, the jitter delay control circuit 310 may control the jitter buffer 308 so that the next packet after the current packet is immediately read out from the jitter buffer 308 to the decoder 312. This may effectively overwrite the current packet, thereby causing, potentially, most of the current packet not to be decoded.

After dropping the current packet, the process of FIG. 4 may loop back to 400.

If a negative determination is made at 404 (i.e., if it is determined that the current packet is neither a silence packet nor a low level noise packet), then the jitter delay control circuit 310 may wait for the next packet, as indicated at 408, and the process of FIG. 4 may effectively loop back to 402.

If a positive determination is made at 402 (i.e., if the need to drop a packet is urgent), then a determination indicated at 410 may be made. At 410 it is determined whether the level estimation and active voice detector block 316 is signaling that the current packet is any one of a silence packet, a low level noise packet, or a low level speech packet. If a positive determination is made at 410, then the current packet may be dropped per 406, and the process of FIG. 4 may loop back to 400. If a negative determination is made at 410 (i.e., if it is determined that the current packet is neither a silence packet, nor a low level noise packet, nor a low level speech packet), then the jitter delay control circuit 310 may wait for the next packet, as indicated at 412, and the process of FIG. 4 may loop back to 410.
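The process of FIG. 4 might be sketched as the loop below, with comments keyed to the block numbers. This is a sketch under stated assumptions: packet characteristics are represented as plain strings, the urgency timeout is counted in packets rather than in elapsed time, and the names choose_drops, drops_needed and urgent_after are hypothetical.

```python
from typing import Iterable, List

NORMAL_DROPPABLE = {"silence", "low_level_noise"}
URGENT_DROPPABLE = NORMAL_DROPPABLE | {"low_level_speech"}


def choose_drops(kinds: Iterable[str], drops_needed: int, urgent_after: int) -> List[int]:
    """Return the positions of the packets that would be dropped.

    kinds yields the characteristic signaled for each successive
    packet. A drop becomes urgent once urgent_after packets have
    passed without a drop since the need was determined.
    """
    dropped: List[int] = []
    waited = 0
    for position, kind in enumerate(kinds):
        if len(dropped) >= drops_needed:      # 400: no further drop needed
            break
        urgent = waited >= urgent_after       # 402: has the period timed out?
        allowed = URGENT_DROPPABLE if urgent else NORMAL_DROPPABLE  # 410 or 404
        if kind in allowed:
            dropped.append(position)          # 406: drop the current packet
            waited = 0                        # loop back to 400
        else:
            waited += 1                       # 408 or 412: wait for next packet
    return dropped
```

Once the drop has become urgent, the sketch keeps applying the wider test of block 410 to each following packet until a drop occurs, which corresponds to the loop from 412 back to 410.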

By selecting silence packets or other suitable packets for dropping, the receiver 304 may operate so as to minimize or eliminate glitches in the output audio signal, thereby improving the perceived audio quality provided by the receiver.

Again, it will be appreciated that the transmitter 302 may be part of a device (not separately shown) which also includes a receiver like the receiver 304, and the receiver 304 may be part of a device (not separately shown) which includes a transmitter like the transmitter 302, so that a two-way telephone connection may be made.

In some embodiments, a VoP receiver may be provided which is able to select packets for dropping both in accordance with the technique described with reference to FIGS. 1 and 2 and in accordance with the technique described with reference to FIGS. 3 and 4. Such a receiver may switch between the two techniques. For example, the receiver may use the voice activity marker bit detection technique when the transmitter is able to provide the required voice activity marker bits, but may switch over to use the ALC-based technique in cases where the transmitter is not able to provide the required voice activity marker bits.

FIG. 5 is a block diagram of a system 500 that includes a receiver 104 or a receiver 304 such as those described above in connection with FIGS. 1-4. In the system 500, the output of the receiver 104 or 304 is coupled to drive a speaker 502 to audibly reproduce the audio output signal. It will be appreciated that buffers, amplifiers and the like, though not shown in the drawing, may be present between the receiver 104 or 304 and the speaker 502.

The techniques disclosed herein for selecting packets to be discarded may be applied in any IP (Internet Protocol) voice communication device, including, but not limited to, media gateways, IP telephones, media servers, Wi-Fi telephones and any device that uses RTP (Real-time Transport Protocol) for voice communication.

In at least some of the embodiments described above, at least some components (e.g., the jitter delay control circuit 118, FIG. 1) may be constituted by one or more general purpose processors (not separately shown) and/or digital signal processors (not separately shown) coupled to one or more program memories and/or other storage devices. The program storage device(s) may store software and/or firmware instructions that control the processor(s) to perform the above-described functions of the components in question.

The several embodiments described herein are solely for the purpose of illustration. The various features described herein need not all be used together, and any one or more of those features may be incorporated in a single embodiment. Therefore, persons skilled in the art will recognize from this description that other embodiments may be practiced with various modifications and alterations.

1. A method comprising: receiving a voice signal in the form of a sequence of data packets; detecting a marker bit in a header of one of the received packets; selecting at least one received packet in response to the detected marker bit; and dropping the selected at least one packet.
2. The method of claim 1, wherein the selected at least one received packet precedes in the sequence of data packets the packet having the header in which the marker bit is detected.
3. The method of claim 1, wherein the marker bit is a voice activity marker bit.
4. The method of claim 1, further comprising: buffering the received sequence of data packets.
5. The method of claim 1, further comprising: generating an audio output from the received sequence of data packets less the dropped at least one packet.
6. An apparatus comprising: a buffer to receive and store a sequence of data packets that represents a voice signal; and a jitter delay control circuit coupled to the buffer to: detect a marker bit in a header of one of the packets stored in the buffer; select at least one other of the packets stored in the buffer in response to detecting the marker bit; and drop the selected at least one other packet.
7. The apparatus of claim 6, wherein the selected at least one other of the packets precedes in the sequence of data packets the packet having the header in which the marker bit is detected.
8. The apparatus of claim 6, wherein the marker bit is a voice activity marker bit.
9. The apparatus of claim 6, further comprising: a decoder coupled to the buffer to generate an audio output from the sequence of data packets stored in the buffer less the dropped at least one other packet.
10. A system comprising: a receiver to receive a sequence of data packets that represents a voice signal; and a speaker coupled to the receiver to audibly reproduce the voice signal; wherein the receiver includes: a buffer to receive and store the sequence of data packets; and a jitter delay control circuit coupled to the buffer to: detect a marker bit in a header of one of the packets stored in the buffer; select at least one other of the packets stored in the buffer in response to detecting the marker bit; and drop the selected at least one other packet.
11. The system of claim 10, wherein the selected at least one other of the packets precedes in the sequence of data packets the packet having the header in which the marker bit is detected.
12. The system of claim 10, wherein the marker bit is a voice activity marker bit.
13. The system of claim 10, wherein the receiver further includes: a decoder coupled to the buffer to generate an audio output from the sequence of data packets stored in the buffer less the dropped at least one other packet.
14. A method comprising: receiving at a receiver a voice signal in the form of a sequence of data packets; using an automatic level control component of the receiver to detect a characteristic of one of the received packets; and dropping the one of the received packets in response to the detected characteristic.
15. The method of claim 14, wherein the detected characteristic is that the one of the received packets is a silence packet.
16. The method of claim 14, wherein the detected characteristic is that the one of the received packets is a noise packet.
17. The method of claim 14, wherein the detected characteristic is that the one of the received packets is a speech signal packet at a volume level below a predetermined threshold.
18. The method of claim 14, further comprising: generating an audio output from the received sequence of data packets less the dropped one of the packets.
19. An apparatus comprising: a buffer to receive and store a sequence of data packets that represents a voice signal; a decoder coupled to the buffer to receive the sequence of data packets and to convert the sequence of data packets to an audio output signal; an automatic level control (ALC) component coupled to the decoder to: selectively apply a gain adjustment to the audio output signal; and detect a characteristic of one of the received packets; and a jitter delay control circuit coupled to the ALC component and to the buffer to selectively drop the one of the received packets in response to the characteristic detected by the ALC component.
20. The apparatus of claim 19, wherein the detected characteristic is that the one of the received packets is a silence packet.
21. The apparatus of claim 19, wherein the detected characteristic is that the one of the received packets is a noise packet.
22. The apparatus of claim 19, wherein the detected characteristic is that the one of the received packets is a speech signal packet at a volume level below a predetermined threshold.
23. A system comprising: a receiver to receive a sequence of data packets that represents a voice signal; and a speaker coupled to the receiver to audibly reproduce the voice signal; wherein the receiver includes: a buffer to receive and store the sequence of data packets; and a decoder coupled to the buffer to receive the sequence of data packets and to convert the sequence of data packets to an audio output signal; an automatic level control (ALC) component coupled to the decoder to: selectively apply a gain adjustment to the audio output signal; and detect a characteristic of one of the received packets; and a jitter delay control circuit coupled to the ALC component and to the buffer to selectively drop the one of the received packets in response to the characteristic detected by the ALC component.
24. The system of claim 23, wherein the detected characteristic is that the one of the received packets is a silence packet.
25. The system of claim 23, wherein the detected characteristic is that the one of the received packets is a noise packet.
26. The system of claim 23, wherein the detected characteristic is that the one of the received packets is a speech signal packet at a volume level below a predetermined threshold.
27. An apparatus comprising: a storage medium having stored thereon instructions that when executed by a machine result in the following: receiving a voice signal in the form of a sequence of data packets; detecting a marker bit in a header of one of the received packets; selecting at least one received packet in response to the detected marker bit; and dropping the selected at least one packet.
28. The apparatus of claim 27, wherein the selected at least one received packet precedes in the sequence of data packets the packet having the header in which the marker bit is detected.
29. The apparatus of claim 27, wherein the marker bit is a voice activity marker bit.